Abstract

We consider source coding with fixed lag side information at the decoder. We 
focus on the special case of perfect side information with unit lag corresponding to 
source coding with feedforward (the dual of channel coding with feedback) introduced
by Pradhan [1]. We use this duality to develop a linear complexity algorithm
which achieves the rate-distortion bound for any memoryless finite alphabet source 
and distortion measure. 



1 Introduction 

There is a growing consensus that understanding complex, distributed systems requires 
a combination of ideas from communication and control [2]. Adding communication con- 
straints to traditional control problems or adding real-time constraints to communication 
problems has recently yielded interesting results [3-7]. We consider a related aspect of 
this interaction by exploring the possible advantages that the feedback/feedforward in 
control scenarios can provide in compression. Specifically, we explore a variant of the 
Wyner-Ziv problem [8] where causal side information about the source is available with 
a fixed lag to the decoder and explore how such side information may be used. 

For example, consider a remote sensor that sends its observations to a controller as
illustrated in Fig. 1. The sensor may be a satellite or aircraft reporting the upcoming
temperature, wind speed, or other weather data to a vehicle. The sensor observations 
must be encoded via lossy compression to conserve power or bandwidth. In contrast to 
the standard lossy compression scenario, however, the controller directly observes the 
original, uncompressed data after some delay. The goal of the sensor observations is to 
provide the controller with information about upcoming events before they occur. Thus
at first it might not seem that observing the true, uncompressed data after they occur
would be useful. Our main goal is to try to understand how these delayed observations
of the source data (which we call side information) can be used. Our main result is that
such information can be quite valuable.

Figure 1: A sensor compresses and sends the source sequence $x_1, x_2, \ldots$, to a controller
which reconstructs the quantized sequence $\hat{x}_1, \hat{x}_2, \ldots$, in order to take some control action.
After a delay or lag of $\Delta$, the controller observes the original, uncompressed data directly.

The following toy problem helps illustrate some relevant issues. Imagine that Alice 
plays a game where she will be asked 10 Yes/No questions. Of these questions, 5 have 
major prizes while the others have minor prizes. After answering each question, she is 
told the correct answer as well as what the prize for that question is and receives the prize 
if she is correct. Bob knows all the questions and the corresponding prizes beforehand 
and wishes to help Alice by preparing a "cheat-sheet" for her. But Bob only has room 
to record 5 answers. Is there a cheat-sheet encoding strategy that guarantees that Alice 
will always correctly answer the questions with the 5 best prizes? No such strategy exists 
using a classical compression scheme. Instead, as illustrated in Section 3, the optimal
strategy requires an encoding which uses the fact that Alice gains information about the 
prize after answering. 

In the rest of the paper, we study the fixed lag side information problem. Since solving 
the general problem seems difficult, we begin by focusing on perfect side information 
with a unit lag. This special case is the feedforward source coding problem introduced 
by Pradhan [1] and is dual (in the sense of [9,10]) to channel coding with feedback. By 
using the feedforward side information, it is possible to construct low complexity source 
coding systems which can achieve the rate-distortion bound. Specifically, [1] describes 
how to adapt the Kailath-Schalkwijk scheme for the Gaussian channel with feedback [11] 
to the Gaussian source squared distortion scenario with feedforward side information. In 
this paper, we consider finite alphabet sources with arbitrary memoryless distributions 
and arbitrary distortion measures. Since Ooi and Wornell's channel coding with feedback 
scheme [12] achieves the best error exponent with minimum complexity, we investigate 
the source coding dual. Specifically, we show that the source coding dual of the Ooi- 
Wornell scheme achieves the rate-distortion bound with linear complexity. 

We begin by describing the problem in Section 2. Next we present a simple example
of how feedforward side information can be useful in the binary erasure quantization
problem in Section 3. In Section 4 we consider the more complicated example of quantizing
a binary source with respect to Hamming distortion. We present our source coding
algorithm for general sources in Section 5 and show that it achieves the rate-distortion
function with low complexity. We close with some concluding remarks in Section 6.

2 Problem Description 

Random variables are denoted using the sans serif font (e.g., $\mathsf{x}$) with deterministic values
using serif fonts (e.g., $x$). We represent the $i$th element of a sequence as $x_i$ and denote a
subsequence including elements from $i$ to $j$ as $x_i^j$.

We consider (memoryless) source coding with fixed lag side information and represent
an instance of the problem with the tuple $(\mathcal{X}, \mathcal{W}, p_{\mathsf{x},\mathsf{w}}, d(\cdot,\cdot), \Delta)$ where $\mathcal{X}$ and $\mathcal{W}$ represent
the source and side information alphabets, $p_{\mathsf{x},\mathsf{w}}(x,w)$ represents the source and side
information joint distribution, $d(\cdot,\cdot)$ represents the distortion measure, and $\Delta$ represents
the delay or lag. Specifically, the source and side information each consist of a sequence
of $n$ random variables $\mathsf{x}_1^n$ and $\mathsf{w}_1^n$ taking values in $\mathcal{X}$ and $\mathcal{W}$ generated according to the
distribution $p_{\mathsf{x}_1^n,\mathsf{w}_1^n}(x_1^n, w_1^n) = \prod_{i=1}^{n} p_{\mathsf{x},\mathsf{w}}(x_i, w_i)$.

A rate $R$ encoder, $f(\cdot)$, maps $x_1^n$ to a bit sequence represented as an integer
$m \in \{1, 2, \ldots, 2^{nR}\}$. The corresponding decoder works as follows. At time $i$, the decoder
takes as input $m$ as well as the side information samples, $w_1^{i-\Delta}$, and produces the $i$th
reconstruction $\hat{x}_i$. A distortion of $d(x_i, \hat{x}_i)$ is then incurred for the $i$th sample where $d(\cdot,\cdot)$
is a mapping from $\mathcal{X} \times \mathcal{X}$ to the interval $[0, d_{\max}]$.

The basic problem can be specialized to the original (non-causal) Wyner-Ziv problem
[8], by allowing a negative delay $\Delta = -\infty$. Similarly, setting $\Delta = 0$ yields a causal version
of the Wyner-Ziv problem. Finally, letting the side information be exactly the same as
the source with a positive delay yields the feedforward source coding problem studied
in [1]. For all these cases, the goal is to understand the fundamental rate-distortion-complexity
performance. To show that the benefits of fixed lag side information are
worth investigating, we focus on the feedforward case where $\mathsf{w} = \mathsf{x}$ with unit delay
$\Delta = 1$ throughout the rest of this paper.

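For memoryless sources, the information feedforward rate-distortion function, $R^{I}_{f}(D)$,
is defined to be the same as Shannon's classical rate-distortion function:

$$R^{I}_{f}(D) = \min_{p_{\hat{\mathsf{x}}|\mathsf{x}} \,:\, E[d(\mathsf{x},\hat{\mathsf{x}})] \le D} I(\mathsf{x};\hat{\mathsf{x}}). \tag{1}$$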

The operational feedforward rate-distortion function, $R_f(D)$, is the minimum rate required
such that there exists a sequence of encoders and decoders with average distortion,
$\frac{1}{n}\sum_{i=1}^{n} d(x_i, \hat{x}_i)$, asymptotically approaching $D$. As observed in [1] and shown in the
appendix, the information and operational feedforward rate-distortion functions are the
same. Thus feedforward does not reduce the rate required, but as we argue in the rest
of this paper, it allows us to approach the rate-distortion function with low complexity.
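
The minimization that defines the rate-distortion function rarely has a closed form, but it can be evaluated numerically. The following is a minimal sketch using the standard Blahut-Arimoto iteration, which is not part of this paper. The function name `blahut_arimoto`, the slope parameter `beta`, and the iteration count are illustrative assumptions.

```python
from math import exp, log, log2

def blahut_arimoto(p_x, d, beta, iters=500):
    """Compute one point on the rate-distortion curve of a memoryless source.

    p_x: source probabilities; d[i][j]: distortion between source symbol i and
    reconstruction symbol j; beta > 0: slope parameter trading rate for distortion.
    Returns (rate_in_bits, distortion).
    """
    nx, ny = len(p_x), len(d[0])
    q = [1.0 / ny] * ny  # reconstruction marginal, initialised to uniform
    for _ in range(iters):
        # Test channel p(xhat | x) proportional to q(xhat) * exp(-beta * d(x, xhat))
        cond = []
        for i in range(nx):
            w = [q[j] * exp(-beta * d[i][j]) for j in range(ny)]
            z = sum(w)
            cond.append([v / z for v in w])
        # Reconstruction marginal induced by the current test channel
        q = [sum(p_x[i] * cond[i][j] for i in range(nx)) for j in range(ny)]
    dist = sum(p_x[i] * cond[i][j] * d[i][j]
               for i in range(nx) for j in range(ny))
    rate = sum(p_x[i] * cond[i][j] * log2(cond[i][j] / q[j])
               for i in range(nx) for j in range(ny) if cond[i][j] > 0)
    return rate, dist

# Uniform binary source with Hamming distortion; beta chosen so that D = 0.11.
rate, dist = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], log(0.89 / 0.11))
print(rate, dist)
```

For the uniform binary source with Hamming distortion, the returned pair should agree with the closed form $R(D) = 1 - H_b(D)$ used in Section 4.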



3 Example: Binary Source & Erasure Distortion 

The simplest example in channel coding with feedback is the erasure channel and in this
case the algorithm in Fig. 2 achieves capacity. At time 1 the encoder puts message bit $m_1$
into the channel. If it is received correctly, the transmitter then transmits $m_2$, otherwise
$m_1$ is repeated until it is successfully received. The same process is used for $m_2$, $m_3$, etc.
For example, to send the message 0101 through a channel where samples 2, 3, 6, and 7 are
erased, the transmitter would send 01110111 and the receiver would see 0**10**1. In
general, if there are $n$ message bits, $m_1, m_2, \ldots, m_n$, and $e$ erasures, then exactly $n + e$
channel uses are required. This yields a transmission rate of $n/(n + e)$ which is exactly
the channel capacity.
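
The repeat-until-received rule can be sketched in a few lines. This is an illustrative simulation only: the function names and the way the erasure pattern is passed in (a set of 1-based channel-use indices) are assumptions made for the example.

```python
def send_with_feedback(message, erased):
    """Repeat each message bit until feedback reports an unerased reception.

    message: list of bits; erased: set of 1-based channel uses that are erased
    (a hypothetical way of fixing the erasure pattern in advance).
    Returns (sent, received) as strings, with '*' marking an erasure.
    """
    sent, received = [], []
    t = 0  # channel-use counter
    for bit in message:
        while True:
            t += 1
            sent.append(str(bit))
            if t in erased:
                received.append("*")      # erased: feedback triggers a repeat
            else:
                received.append(str(bit))
                break                      # received: move to the next bit
    return "".join(sent), "".join(received)

def decode_erasures(received):
    """The decoder keeps every unerased channel output, in order."""
    return [int(c) for c in received if c != "*"]

sent, received = send_with_feedback([0, 1, 0, 1], {2, 3, 6, 7})
print(sent, received)  # 01110111 0**10**1
```

With 4 message bits and 4 erasures the simulation uses 8 channel uses, matching the $n + e$ count above.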

The dual to the binary erasure channel (BEC) is the binary erasure quantization
problem (BEQ). In the BEQ, each source sample can be either 0, 1, or * where * represents
"don't care". The distortion measure is such that 0 and 1 cannot be changed but * can
be quantized to either 0 or 1 with no distortion. The BEQ models the game introduced in
the introduction (see footnote 1). To develop a source coding with feedforward algorithm for the BEQ,
we can dualize the channel coding with feedback algorithm for the BEC as illustrated in
Fig. 3.

Assume the source is $x_1^8$ = 0**10**1. The encoder compresses this to $m$ = 0101 by
ignoring all the * symbols and sends this to the receiver. At time 1, the receiver chooses $\hat{x}_1$

1 Yes/No answers for questions with major prizes map to 1/0 values for the source while questions 
with minor prizes map to a value of * for the source. The distortion measure represents the restriction 
that questions with major prizes must be answered correctly while the answers for the other questions 
are irrelevant. 






[Figure 2 flowchart. Encoder boxes: "Send $m_i$ through channel", "increase $i$". Decoder boxes: "$z$ = next channel sample", "$\hat{m}_i = z$", "increase $i$".]

Figure 2: Encoder (left) for transmitting a message $m = m_1, m_2, \ldots$ across an erasure
channel with feedback and decoder (right) for producing an estimate of the transmitted
message $\hat{m}$.
Figure 3: Encoder (left) and decoder (right) for the binary erasure quantization problem
which is dual to the binary erasure channel. 

to be the first bit in the encoding (i.e., $\hat{x}_1 = m_1 = 0$). From the feedforward the receiver
realizes this is correct after it makes its choice. Next, the receiver chooses $\hat{x}_2$ to be the
next bit received (i.e., $\hat{x}_2 = m_2 = 1$). After choosing this reconstruction, the receiver is
told that in fact $x_2 = *$ so even though $\hat{x}_2 \ne x_2$, no distortion is incurred. At this point,
the receiver realizes that $m_2$ must have been intended to describe something after $x_2$. So
at time 3, the receiver chooses $\hat{x}_3 = m_2 = 1$. Once again the receiver learns that this is
incorrect since $x_3 = *$, but again no penalty is incurred. Again, the receiver decides that
$m_2$ must have been intended to describe something else so it chooses $\hat{x}_4 = m_2 = 1$ at
time 4. This turns out to be correct and so at time 5 the receiver chooses $\hat{x}_5 = m_3$, etc.
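
The decoding walk-through above can be sketched as follows. The function names are hypothetical, and the source is represented as a string over '0', '1', and '*' purely for convenience. The decoder consults each source symbol only after it has committed to its reconstruction, which models unit-lag feedforward.

```python
def beq_encode(source):
    """Keep only the non-erased symbols ('0' or '1') of the source string."""
    return [int(c) for c in source if c != "*"]

def beq_decode(message, source):
    """Reconstruct with unit-lag feedforward.

    `source` is read only AFTER each reconstruction is emitted: the decoder
    learns x_i once it has already chosen xhat_i.
    """
    recon = []
    j = 0  # index of the message bit currently being matched to the source
    for x in source:
        # Once the message is exhausted, any guess is fine (only '*' can remain).
        guess = message[j] if j < len(message) else 0
        recon.append(guess)
        # Feedforward: a real bit means message[j] described this sample.
        if x != "*" and j < len(message):
            j += 1
    return recon

def beq_distortion(source, recon):
    """Count non-erased source symbols that were reconstructed incorrectly."""
    return sum(1 for x, r in zip(source, recon) if x != "*" and int(x) != r)

source = "0**10**1"
message = beq_encode(source)
recon = beq_decode(message, source)
print(message, recon, beq_distortion(source, recon))
```

In this example the reconstruction sequence coincides with the channel input 01110111 of the erasure channel example, which reflects the duality between the two problems.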

In the encoder/decoder described above, the encoder sends the non-erased bits of $x_1^n$
and the decoder tries to match up the compressed data to the source. This system yields
zero distortion provided that at least $n - e$ bits are sent where $e$ denotes the number of *
symbols in the source vector. It is straightforward to show that no encoder/decoder can
do better for any value of $n$ or $e$. A system not taking advantage of the feedforward could
asymptotically achieve the same performance but it would require more complexity and
more redundancy. Thus just as in the erasure channel with feedback, we see that for the
erasure source, feedforward allows us to achieve the minimum possible redundancy with
minimum complexity.



4 Example: Binary Source & Hamming Distortion 

The example in Section 3 illustrates that feedforward can be useful in source coding
by using some special properties of the BEQ problem. Next, we consider a somewhat
more complicated example to illustrate that a key idea in developing lossy compression
algorithms for source coding with feedforward is the use of classical lossless compression
algorithms. Specifically, we consider a binary source which is equally likely to be either
zero or one, and we consider the Hamming distortion measure $d(x,\hat{x}) = |x - \hat{x}|$. As is
well known, the rate-distortion function for this case is $R(D) = 1 - H_b(D)$ where $H_b(\cdot)$
is the binary entropy function. In the following we outline a scheme which achieves a
distortion of $D_0 = 0.11$ and rate $R(D_0) = 1 - H_b(0.11) \approx 1/2$.
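
The numerical claim that a distortion of 0.11 corresponds to a rate of roughly one half can be checked directly. The helper names below are illustrative.

```python
from math import log2

def binary_entropy(p):
    """Binary entropy function H_b(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def rate_distortion_binary(D):
    """R(D) = 1 - H_b(D) for a uniform binary source with Hamming distortion."""
    return 1.0 - binary_entropy(D) if D < 0.5 else 0.0

print(round(rate_distortion_binary(0.11), 3))  # 0.5
```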

Let $\mathcal{C}(\cdot)$ and $\mathcal{C}^{-1}(\cdot)$ be a lossless compression and decompression algorithm for a
Bernoulli($D_0$) source. Specifically, $\mathcal{C}(\cdot)$ takes as input $t$ bits with a fraction $D_0$ ones
and maps them into $H_b(D_0) \cdot t \approx t/2$ uniformly distributed bits while $\mathcal{C}^{-1}(\cdot)$ maps $t'$
approximately uniformly distributed bits into $t'/H_b(D_0) \approx 2t'$ bits with a fraction $D_0$
ones. To simplify the exposition, we assume that for $t' > M$, these approximations are
exact. A more careful treatment appears in Section 5.

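The feedforward lossy compression system encoder takes a sequence of $n = M(2^K - 1)$ source samples,
$x_1^{n}$, and encodes by producing the following codewords:

\begin{align}
b_1 &= x_{n-M+1}^{n} \tag{2}\\
b_2 &= x_{n-3M+1}^{n-M} \oplus \mathcal{C}^{-1}(b_1) = x_{n-3M+1}^{n-M} \oplus \mathcal{C}^{-1}\left(x_{n-M+1}^{n}\right) \tag{3}\\
b_3 &= x_{n-7M+1}^{n-3M} \oplus \mathcal{C}^{-1}(b_2) = x_{n-7M+1}^{n-3M} \oplus \mathcal{C}^{-1}\left(x_{n-3M+1}^{n-M} \oplus \mathcal{C}^{-1}(b_1)\right) \tag{4}\\
&= x_{n-7M+1}^{n-3M} \oplus \mathcal{C}^{-1}\left(x_{n-3M+1}^{n-M} \oplus \mathcal{C}^{-1}\left(x_{n-M+1}^{n}\right)\right) \tag{5}\\
&\;\;\vdots \tag{6}\\
b_K &= x_{1}^{M2^{K-1}} \oplus \mathcal{C}^{-1}(b_{K-1}) = x_{1}^{M2^{K-1}} \oplus \mathcal{C}^{-1}\left(x_{M2^{K-1}+1}^{3M2^{K-2}} \oplus \mathcal{C}^{-1}(b_{K-2})\right) \tag{7}\\
&= x_{1}^{M2^{K-1}} \oplus \mathcal{C}^{-1}\left(x_{M2^{K-1}+1}^{3M2^{K-2}} \oplus \mathcal{C}^{-1}\left(x_{3M2^{K-2}+1}^{7M2^{K-3}} \oplus \cdots \oplus \mathcal{C}^{-1}\left(x_{n-M+1}^{n}\right)\right)\right) \tag{8}
\end{align}

according to the general rule

$$b_j = x_{n-(2^{j}-1)M+1}^{\,n-(2^{j-1}-1)M} \oplus \mathcal{C}^{-1}(b_{j-1}). \tag{9}$$

The output of the encoder is the $M \cdot 2^{K-1}$ bit sequence $b_K$ for the last block.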

As we see from (7), $b_K$ is a description of the first block of source samples corrupted
by the addition of $\mathcal{C}^{-1}(b_{K-1})$. The decoder reconstructs the first $M \cdot 2^{K-1}$ source samples
via

$$\hat{x}_{1}^{M2^{K-1}} = b_K = x_{1}^{M2^{K-1}} \oplus \mathcal{C}^{-1}(b_{K-1}). \tag{10}$$

The distortion for this block is approximately $D_0$ since, by assumption, the decompresser
$\mathcal{C}^{-1}(\cdot)$ maps its input to a sequence with a fraction $D_0$ ones. The error between the
reconstruction and the true source obtained via feedforward is a description of future
source samples shaped by the function $\mathcal{C}^{-1}(\cdot)$. Thus, to reconstruct the next block, the
decoder uses the feedforward, $x_{1}^{M2^{K-1}}$, to produce

$$\hat{x}_{M2^{K-1}+1}^{3M2^{K-2}} = \mathcal{C}\left(x_{1}^{M2^{K-1}} \oplus b_K\right) = b_{K-1} = x_{M2^{K-1}+1}^{3M2^{K-2}} \oplus \mathcal{C}^{-1}(b_{K-2}). \tag{11}$$
Once again the distortion is approximately $D_0$ since the decompresser maps its input to
a sequence with a fraction $D_0$ ones.

The decoder proceeds in this manner and obtains a distortion of approximately $D_0$ for
each block except the last which yields no distortion. The average distortion is therefore
roughly $D_0$. Since $M2^{K-1}$ bits are required to describe $b_K$ in encoding the $M(2^K - 1)$ source
samples, the average bit rate is $2^{K-1}/(2^K - 1) \approx 1/2$. Thus by taking advantage of the
source feedforward, we can obtain a point on the rate distortion curve simply by using a
low complexity lossless compression algorithm.
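
The backward encoding and forward decoding described in this section can be exercised end to end with a toy stand-in for the compressor pair. The sketch below is not the idealized compressor assumed above. It swaps in enumerative coding of fixed-weight bit strings, so `shape` plays the role of the decompresser and `deshape` the role of the compressor. All function names and the parameters in the demo are assumptions for illustration. Because small blocks are shaped inefficiently, the resulting rate is noticeably above 1/2.

```python
from math import comb
import random

def unrank(index, n, k):
    """Return the n-bit list with exactly k ones whose lexicographic rank is index."""
    bits = []
    for pos in range(n):
        zero_first = comb(n - pos - 1, k)  # sequences that place a 0 here
        if index < zero_first:
            bits.append(0)
        else:
            bits.append(1)
            index -= zero_first
            k -= 1
    return bits

def rank(bits):
    """Inverse of unrank."""
    k, index = sum(bits), 0
    for pos, b in enumerate(bits):
        if b:
            index += comb(len(bits) - pos - 1, k)
            k -= 1
    return index

def shaped_length(t, frac):
    """Smallest n >= t whose weight-round(frac * n) strings can carry t bits."""
    n = t
    while comb(n, round(frac * n)) < 2 ** t:
        n += 1
    return n

def shape(bits, frac):
    """Stand-in for the decompresser: map t bits to a longer, sparse bit list."""
    n = shaped_length(len(bits), frac)
    return unrank(int("".join(map(str, bits)), 2), n, round(frac * n))

def deshape(sparse, t):
    """Stand-in for the compressor: recover the t bits from the sparse list."""
    return [int(c) for c in format(rank(sparse), "0{}b".format(t))]

def block_lengths(M, K, frac):
    ts = [M]
    for _ in range(K - 1):
        ts.append(shaped_length(ts[-1], frac))
    return ts  # ts[0]: last block in time, ts[-1]: first block in time

def encode(x, ts, frac):
    end = len(x)
    b = x[end - ts[0]:end]            # b_1: the last block, sent as is
    end -= ts[0]
    for t in ts[1:]:                  # work backward through earlier blocks
        noise = shape(b, frac)
        b = [u ^ v for u, v in zip(x[end - t:end], noise)]
        end -= t
    return b                          # b_K, the only thing transmitted

def decode(b, x, ts):
    xhat, start = [], 0
    for j in range(len(ts) - 1, -1, -1):
        xhat.extend(b)                # reconstruct the block before seeing it
        if j > 0:                     # feedforward then reveals the true block
            noise = [u ^ v for u, v in zip(x[start:start + ts[j]], b)]
            b = deshape(noise, ts[j - 1])
        start += ts[j]
    return xhat

random.seed(0)
frac, ts = 0.11, block_lengths(4, 4, 0.11)
x = [random.randint(0, 1) for _ in range(sum(ts))]
b_K = encode(x, ts, frac)
xhat = decode(b_K, x, ts)
errors = sum(u != v for u, v in zip(x, xhat))
print(len(b_K) / len(x), errors / len(x))  # rate, average Hamming distortion
```

Each block except the last is reconstructed with exactly as many errors as the weight of the shaped sequence added to it, and the last block is reconstructed without error, mirroring the distortion accounting above.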

5 Finite Alphabet Sources & Arbitrary Distortion 

In this section, we generalize the construction in Section 4 to arbitrary rates, source
distributions and distortion measures. We require two components: a lossless compres- 
sion/decompression algorithm and a shaping algorithm. Using these subsystems, we 
describe our feedforward source coding algorithm and present an analysis of its rate and 
distortion. 

5.1 Feedforward Source Coding Subsystems 

Our lossless compression and shaping algorithms must be efficient in some sense for the 
overall feedforward source coding algorithm to approach the rate-distortion function. 
Instead of delving into the details of how to build efficient compression and shaping 
algorithms, we define admissible systems to illustrate the required properties. We then 
describe how efficient subsystems can be combined. 

5.1.1 Lossless Compression Subsystem 

We define a $(\delta, \epsilon, m)$ admissible lossless compression system as follows. On input of $m$
samples of $\hat{\mathsf{x}}$ which are $\delta$-strongly typical (see footnote 2) according to the distribution $p_{\hat{\mathsf{x}}}$, the compressor,
denoted $\mathcal{C}_{\hat{\mathsf{x}}}(\cdot)$, returns $m \cdot H(\hat{\mathsf{x}}) + \epsilon$ bits. If the input is not $\delta$-strongly typical, the
output is undefined. The corresponding decompresser, $\mathcal{C}_{\hat{\mathsf{x}}}^{-1}(\cdot)$, takes the resulting bits
and reproduces the original input.

5.1.2 Shaping Subsystem 

We define a $(\delta, \epsilon, m)$ admissible shaping system as follows. On input of a sequence of $m$
bits, and a semi-infinite sequence of samples, $x_1^{\infty}$, which is $\delta$-strongly typical according to
the distribution $p_{\mathsf{x}}$, the shaper $\mathcal{S}_{\hat{\mathsf{x}}|\mathsf{x}}(\cdot)$ returns a sequence of $m' = m \cdot [H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x})] + \epsilon$
samples, $\hat{x}_1^{m'}$, such that $(\hat{x}_1^{m'}, x_1^{m'})$ is $\delta$-strongly typical according to the distribution $p_{\hat{\mathsf{x}},\mathsf{x}}$.
If the input is not $\delta$-strongly typical, the output is undefined. The corresponding deshaper
takes the pair of sequences $x_1^{m'}$ and $\hat{x}_1^{m'}$ as input and returns the original sequence of $m$
bits.

The compression and shaping systems described previously are fixed-to-variable and
variable-to-fixed systems respectively. Hence, for notational convenience we define the
corresponding length functions $\ell(\mathcal{C}_{\hat{\mathsf{x}}}(\cdot))$ and $\ell(\mathcal{S}_{\hat{\mathsf{x}}|\mathsf{x}}(\cdot))$ as returning the length of their
respective arguments.

2 A sequence is $\delta$-strongly typical if the empirical fraction of occurrences of each possible outcome
differs by at most $\delta$ from the expected fraction of outcomes and no probability zero outcomes occur.

5.1.3 Efficient Shaping and Compression Systems

We call a lossless compression system or a shaping system efficient if both $\delta$ and $\epsilon$ can
be made arbitrarily small for m large enough. Efficient lossless compression systems can 
be implemented in a variety of ways. For example, arithmetic coding is one well-known 
approach. Perhaps less well-known is that shaping systems can also be implemented via 
arithmetic coding [12]. Specifically, by using the decompresser for an arithmetic code as 
a shaper, we can map a sequence of bits into a sequence with an arbitrary distribution. 
The compressor for the arithmetic code takes the resulting sequence and returns the 
original bit sequence. 

5.2 Feedforward Encoder and Decoder 

Since the encoder for our feedforward lossy compression system is based on a variable-to-fixed
shaper and a fixed-to-variable compressor, it is a variable-to-variable system. In
practice, one could use buffering, padding, or other techniques to account for this when 
encoding a fixed length source or when required to produce a fixed length encoding. We 
do not address this issue further here. Instead, we assume that there is a nominal source 
block size parameter, N, and buffering, padding, look-ahead, etc. is used to ensure that 
the system encodes N source samples (or possibly slightly more or less). Also, we assume 
that there is a minimum block size parameter, M, which may be chosen to achieve an 
efficient shaping or lossless compression subsystem. 

Once $N$ and $M$ are fixed, the feedforward encoder takes as input a stream of inputs
$x_1^{\infty}$ and encodes it as described in Tab. 1. The feedforward decoder takes as input
the resulting bit string, $b$, and decodes it as described in Tab. 2. Section 4 describes an
example of the encoding and decoding algorithm with a shaper (denoted $\mathcal{C}^{-1}(\cdot)$) mapping
uniform bits to Bernoulli(0.11) bits. This example does not require a compressor because
the $p_{\hat{\mathsf{x}}}$ distribution is incompressible.



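Table 1: The Feedforward Encoder.

1: Initialize $T = 1$, $L = M$, and reverse the input so that, in the following, $x_1^{\infty}$ denotes the source in reversed order.
2: Take the block of source samples $x_T^{T+L}$ and generate a "noisy version" $\hat{x}_T^{T+L}$ (e.g., by
   generating each $\hat{x}_i$ from the corresponding $x_i$ according to $p_{\hat{\mathsf{x}}|\mathsf{x}}$).
3: while $L + T < N$ do
4:    Compress $\hat{x}_T^{T+L}$ to obtain the bit sequence $b = \mathcal{C}_{\hat{\mathsf{x}}}(\hat{x}_T^{T+L})$.
5:    $T \leftarrow T + L + 1$
6:    $L \leftarrow \ell(\mathcal{S}_{\hat{\mathsf{x}}|\mathsf{x}}(b, x_T^{\infty}))$
7:    $\hat{x}_T^{T+L} \leftarrow \mathcal{S}_{\hat{\mathsf{x}}|\mathsf{x}}(b, x_T^{\infty})$
8: end while
9: return $\mathcal{C}_{\hat{\mathsf{x}}}(\hat{x}_T^{T+L})$
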

5.3 Rate-Distortion Analysis 

Theorem 1. By using efficient lossless compression and shaping subsystems, the distortion
in encoding an i.i.d. sequence generated according to $p_{\mathsf{x}}$ can be made to approach
$E[d(\mathsf{x},\hat{\mathsf{x}})]$ as closely as desired.

Proof. First we note that by assumption, we can choose $M$ large enough so that the
probability of the source sequence being non-typical can be made negligible. For a typical
source sequence, we can focus on how the encoder maps $x_i$ to $\hat{x}_i$ since the decoder simply
maps a bit sequence to the $\hat{x}_i$ sequence chosen by the encoder. The encoder maps blocks
of source samples, $x_T^{T+L}$, to blocks of quantized samples, $\hat{x}_T^{T+L}$, by using an admissible
shaping algorithm. As described in Section 5.1.2, the shaper produces a $\delta$-strongly typical
sequence. Thus the total expected distortion is at most

$$E[d(\mathsf{x},\hat{\mathsf{x}})] + d_{\max} \cdot \delta + d_{\max} \cdot \Pr[\mathsf{x}_1^n \text{ not typical}] \tag{12}$$

where the first two terms are the distortion for a typical sequence produced by the shaper
and the last term is the contribution from a non-typical source sequence. □

Table 2: The Feedforward Decoder.

1: Initialize $T$ to the length of the sequence encoded in $b$.
2: while $T > 1$ do
3:    $L \leftarrow \ell(\mathcal{C}_{\hat{\mathsf{x}}}^{-1}(b))$
4:    $T \leftarrow T - L + 1$
5:    $\hat{x}_T^{T+L} \leftarrow \mathcal{C}_{\hat{\mathsf{x}}}^{-1}(b)$
6:    Get $x_T^{T+L}$ via the feedforward information
7:    $b \leftarrow \mathcal{S}_{\hat{\mathsf{x}}|\mathsf{x}}^{-1}(\hat{x}_T^{T+L}, x_T^{T+L})$
8: end while
9: return the reversed version of $\hat{x}_1^{N}$

Theorem 2. By using efficient lossless compression and shaping subsystems, the rate in
encoding an i.i.d. sequence generated according to $p_{\mathsf{x}}$ can be made to approach $I(\mathsf{x};\hat{\mathsf{x}})$ as
closely as desired.

Proof. Imagine that the parameter $N$ is chosen so that $K$ passes of the loop in the encoding
algorithm are executed. Also, let $L_j$ denote the value of $L$ in line 3 of the encoder in
the $j$th pass. We know $L_1 = M$ by construction. By definition of an admissible shaping
system in Section 5.1.2 and line 6 of the encoder we have that $L_{j+1} \ge L_j \cdot [H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x})]$.
Using this relation and assuming that each block of length $L_j$ is typical, we can compute
the total number of samples encoded via

$$n = \sum_{j=1}^{K} L_j \ge \sum_{j=0}^{K-1} M \cdot \left[\frac{H(\hat{\mathsf{x}})}{H(\hat{\mathsf{x}}|\mathsf{x})}\right]^{j} = M \cdot \frac{\left[H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x})\right]^{K} - 1}{H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x}) - 1}. \tag{13}$$


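The bit rate required to encode these samples is

$$R = L_K \cdot H(\hat{\mathsf{x}}) + \epsilon \le M \cdot H(\hat{\mathsf{x}}) \cdot \left[H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x})\right]^{K-1} + \epsilon \cdot K \cdot \left[H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x})\right]^{K}. \tag{14}$$

This follows by the assumption that the admissible lossless compression system in Section 5.1.1
requires $m \cdot H(\hat{\mathsf{x}}) + \epsilon$ bits to encode a block of $m$ typical samples.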

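Therefore the number of bits per sample when the source blocks are typical is obtained
by dividing (14) by (13) to obtain

\begin{align}
R/n &\le \left\{ M \cdot H(\hat{\mathsf{x}}) \left[\frac{H(\hat{\mathsf{x}})}{H(\hat{\mathsf{x}}|\mathsf{x})}\right]^{K-1} + \epsilon \cdot K \left[\frac{H(\hat{\mathsf{x}})}{H(\hat{\mathsf{x}}|\mathsf{x})}\right]^{K} \right\} \cdot \frac{H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x}) - 1}{M \cdot \left(\left[H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x})\right]^{K} - 1\right)} \tag{15}\\
&= \left[\frac{H(\hat{\mathsf{x}})}{H(\hat{\mathsf{x}}|\mathsf{x})} - 1\right] \cdot \left\{ H(\hat{\mathsf{x}}|\mathsf{x}) + \frac{\epsilon K}{M} \right\} \cdot \left\{ 1 - \left[\frac{H(\hat{\mathsf{x}})}{H(\hat{\mathsf{x}}|\mathsf{x})}\right]^{-K} \right\}^{-1} \tag{16}\\
&= I(\mathsf{x};\hat{\mathsf{x}}) \cdot \left\{ 1 + \frac{\epsilon K}{M \cdot H(\hat{\mathsf{x}}|\mathsf{x})} \right\} \cdot \left\{ 1 - \left[\frac{H(\hat{\mathsf{x}})}{H(\hat{\mathsf{x}}|\mathsf{x})}\right]^{-K} \right\}^{-1}. \tag{17}
\end{align}

An extra term must also be added to account for the possibility that the source is atypical.
By assumption we can choose $N$ so that $K$ is large enough to make the term
$[H(\hat{\mathsf{x}})/H(\hat{\mathsf{x}}|\mathsf{x})]^{-K}$ in the second set of braces negligible, and then we can choose $M$ so that the probability of any source block
being atypical is negligible. Also, by making $M$ large enough we can make the term
$\epsilon K/(M \cdot H(\hat{\mathsf{x}}|\mathsf{x}))$ in the first set of braces negligible. Thus the bit rate can be made as close to $I(\mathsf{x};\hat{\mathsf{x}})$ as desired.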

□ 
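
As a sanity check, the sample count (13) and bit count (14) can be specialized to the binary example of Section 4. The values used below are carried over from that example and are assumptions of this check rather than part of the proof: $H(\hat{\mathsf{x}}) = 1$, $H(\hat{\mathsf{x}}|\mathsf{x}) = H_b(D_0) \approx 1/2$ treated as exact, and $\epsilon = 0$.

```latex
\begin{align*}
\frac{H(\hat{\mathsf{x}})}{H(\hat{\mathsf{x}}|\mathsf{x})} &= 2, \qquad L_j = M \cdot 2^{j-1}, \\
n &= \sum_{j=1}^{K} L_j = M\,(2^{K}-1), \\
R &= L_K \cdot H(\hat{\mathsf{x}}) = M \cdot 2^{K-1}, \\
\frac{R}{n} &= \frac{2^{K-1}}{2^{K}-1} = \frac{1}{2}\cdot\frac{1}{1-2^{-K}}
\;\xrightarrow[K\to\infty]{}\; \frac{1}{2}
= H(\hat{\mathsf{x}}) - H(\hat{\mathsf{x}}|\mathsf{x}) = I(\mathsf{x};\hat{\mathsf{x}}).
\end{align*}
```

This recovers the rate $2^{K-1}/(2^K - 1)$ obtained directly in Section 4.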

Combining the previous theorems indicates that we can approach the feedforward 
rate-distortion function with only the complexity required for lossless compression and 
shaping systems. 

Corollary 1. When linear complexity admissible lossless compression and shaping systems
are used, the resulting feedforward rate-distortion function can be approached arbitrarily
closely with linear complexity.

In particular, we can use the lossless compression and shaping systems described in 
[12] which are based on arithmetic coding and the dual of arithmetic coding respectively. 



6 Concluding Remarks 

In this paper we describe a lossy compression algorithm to encode a finite-alphabet source 
in the presence of feedforward information. In particular, we show that although memo- 
ryless feedforward does not change the rate-distortion function, it allows us to construct a 
low complexity lossy compression system which approaches the rate-distortion function. 
In practice, the particular scheme described here may require modifications and other 
methods of using feedforward information or similar knowledge may be more appropriate. 
Our main goal therefore is not necessarily to advocate a particular scheme but to show 
that when compression, observation, and control interact, additional resources such as 
feedforward may provide advantages not available in the classic compression framework. 
One interesting possibility for future work includes studying the general problem in Section 2
when the fixed lag side information, $\mathsf{w}$, is not exactly the same as the source.
Similarly, investigating the effects of memory in the source and different values for the
delay, $\Delta$, would also be valuable.

A Information/Operational R(D) Equivalence 

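Proposition 1. The information/operational feedforward rate-distortion functions are
equal.

Proof. Since the decoder must deterministically produce $\hat{\mathsf{x}}_i$ from $\mathsf{x}_1^{i-1}$ and the $nR$ bits
produced by the encoder we have

\begin{align*}
nR &\ge \sum_{i=1}^{n} H(\hat{\mathsf{x}}_i | \mathsf{x}_1^{i-1}) \ge \sum_{i=1}^{n} \left[ H(\hat{\mathsf{x}}_i | \mathsf{x}_1^{i-1}) - H(\hat{\mathsf{x}}_i | \mathsf{x}_1^{i}) \right] = \sum_{i=1}^{n} \left[ H(\mathsf{x}_i | \mathsf{x}_1^{i-1}) - H(\mathsf{x}_i | \mathsf{x}_1^{i-1}, \hat{\mathsf{x}}_i) \right] \\
&\overset{(a)}{=} \sum_{i=1}^{n} \left[ H(\mathsf{x}_i) - H(\mathsf{x}_i | \mathsf{x}_1^{i-1}, \hat{\mathsf{x}}_i) \right] \overset{(b)}{\ge} \sum_{i=1}^{n} \left[ H(\mathsf{x}_i) - H(\mathsf{x}_i | \hat{\mathsf{x}}_i) \right]
\end{align*}

where (a) follows since the source is memoryless and (b) follows since conditioning reduces
entropy. From this point standard convexity arguments establish that (1) is a lower bound
to the average rate. □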